Smallest small-world network
Efficiency in passage times is an important issue in designing networks, such as transportation or computer networks. Small-world networks have structures that yield high efficiency while keeping the network highly clustered. We show that among all networks with the small-world structure, the most efficient ones have a single "center", from which all shortcuts are connected to uniformly distributed nodes over the network. Networks with several centers and a connected subnetwork of shortcuts are shown to be "almost" as efficient. Genetic-algorithm simulations further support our results.

PACS numbers: 89.75.Hc, 45.10.Db, 89.20.Hh 



The small-world network models have received much attention from researchers in various disciplines since they were introduced by Watts and Strogatz [1] as models of real networks that lie somewhere between being random and being regular. Small-world networks are characterized by two numbers: the average path length L and the clustering coefficient C. L, which measures the efficiency of communication or passage time between nodes, is defined as the average number of links in the shortest path between a pair of nodes in the network. C represents the degree of local order and is defined as the probability that two nodes connected to a common node are also connected to each other.
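Both quantities can be computed directly for a small test network. The following is a minimal Python sketch, assuming an undirected graph stored as a list of neighbor sets; the helper names `ring_lattice`, `average_path_length`, and `clustering_coefficient` are hypothetical and not taken from the original work. L is obtained by breadth-first search from every node, and C is computed as the pooled fraction of neighbor pairs that are themselves linked, following the probabilistic definition above.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_path_length(adj):
    """Mean shortest-path length (in links) over all ordered pairs u != v.

    Assumes the graph is connected.
    """
    n = len(adj)
    total = 0
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist)
    return total / (n * (n - 1))


def clustering_coefficient(adj):
    """Probability that two neighbors of a common node are linked to each other."""
    linked = pairs = 0
    for u in range(len(adj)):
        for a, b in combinations(adj[u], 2):
            pairs += 1
            if b in adj[a]:
                linked += 1
    return linked / pairs if pairs else 0.0
```

For a ring of 10 nodes with k = 1 this gives L = 25/9, and a ring of 20 nodes with k = 2 gives C = 0.5.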

Many real networks are sparse in the sense that the number of links in the network is much less than N(N − 1)/2, the number of all possible (bidirectional) links. On one hand, random sparse networks have short average path length (i.e., L ~ log N), but they are poorly clustered (i.e., C ≪ 1). On the other hand, regular sparse networks are typically highly clustered, but L is comparable to N. (All-to-all networks have C = 1 and L = 1, so they are the most efficient, but they are also the most expensive, since they have all N(N − 1)/2 possible connections and are therefore dense rather than sparse.) The small-world network models have the advantages of both random and regular sparse networks: they have small L for fast communication between nodes, and they have large C, ensuring sufficient redundancy for high fault tolerance. Many networks in the real world, such as the World Wide Web (WWW) [2], the neural network of C. elegans [1], [3], collaboration networks of actors [1], [4], networks of scientific collaboration [5], and the metabolic network of E. coli [6], have been shown to have this property. The models of small-world networks are constructed from a regular lattice by adding a relatively small number of shortcuts at random, where a link between two nodes u and v is called a shortcut if the shortest path length between u and v in the absence of the link is more than two [7]. The regularity of the underlying lattice ensures high clustering, while the shortcuts reduce the size of L.
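The shortcut criterion just stated can be checked mechanically. A minimal sketch, assuming the same list-of-neighbor-sets representation; `distance_avoiding_link`, `is_shortcut`, and `add_link` are hypothetical helper names. The link under test is skipped during a breadth-first search, and the link counts as a shortcut when the remaining distance exceeds two. The usage lines at the end also show that adding one such link lowers L.

```python
import math
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def add_link(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)


def distance_avoiding_link(adj, u, v):
    """BFS distance from u to v with the direct link u-v (if present) ignored."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if {x, w} == {u, v}:
                continue  # do not use the link being tested
            if w not in dist:
                dist[w] = dist[x] + 1
                if w == v:
                    return dist[w]
                queue.append(w)
    return math.inf


def is_shortcut(adj, u, v):
    """Shortcut criterion: without the link, u and v are more than two links apart."""
    return distance_avoiding_link(adj, u, v) > 2


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs u != v (connected graph)."""
    n = len(adj)
    total = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for w in adj[x]:
                if dist[w] < 0:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        total += sum(dist)
    return total / (n * (n - 1))


ring = ring_lattice(20, 1)
before = average_path_length(ring)
add_link(ring, 0, 10)  # join two opposite nodes of the ring
after = average_path_length(ring)
print(is_shortcut(ring, 0, 10), before, after)
```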

Most work has focused on average properties of such 
models over different realizations of random shortcut con- 
figurations. However, a different point of view is neces- 
sary when a network is to be designed to optimize its 
performance with a restricted number of long-range con- 
nections. For example, a transportation network should 
be designed to have the smallest L possible, so as to max- 
imize the ability of the network to transport people ef- 
ficiently, while keeping a reasonable cost of building the 
network. The same can be said about communication 
networks for efficient exchange of information between 
nodes. We fix the number of shortcuts here; as a result, the clustering coefficient C for any configuration of shortcuts is approximately as high as that of the underlying lattice. The problem we address in this paper is therefore: given a number of shortcuts in a small-world network, which configuration of these shortcuts minimizes L? [8]

Most random choices of shortcuts result in a suboptimal configuration, since they lack any special structure or organization. In contrast, many real networks have highly structured configurations of shortcuts. For example, in long-range transportation networks, the airline connections between major cities, which can be regarded as shortcuts, are far from random: they are organized around hubs. Efficient travel involves ground transportation to the nearest airport, then flights through a hub to the airport closest to the destination, and ground transportation again at the end.

FIG. 1: Examples of shortcut configurations with (a) a single center and (b) two centers.

FIG. 2: (a) Configuration with one shortcut disconnected from the rest of the subnetwork of shortcuts. (b) Various configurations with m = 6 shortcuts.

In the following, we show that the average path length L of a small-world network with a fixed number of shortcuts attains its minimum value when there exists a "center" node, from which all shortcuts are connected to uniformly distributed nodes in the network [9]. An example of such a configuration is illustrated in Fig. 1(a). We also show that if a small-world network has several "centers" and its subnetwork of shortcuts is connected, then L is almost as small as the minimum value. An example of such a configuration is shown in Fig. 1(b). We then derive an explicit formula for the minimum average path length in the case of small-world network models constructed from a one-dimensional lattice by adding a fixed number of shortcuts. Finally, we verify the results by performing genetic-algorithm simulations for minimizing L.

Our general argument proceeds as follows. A small-world network is composed of two parts: the underlying network (e.g., a regular lattice) and the subnetwork of shortcuts, containing only the shortcuts and their end nodes. Let m denote the number of shortcuts. First, for L to be as short as possible, the subnetwork of shortcuts must be connected. This connectivity is unlikely to arise if the shortcuts are chosen at random, since the network is sparse. Indeed, the probability is less than m!/N^{m−1}, where N is the number of nodes in the network.
For example, for N = 1000 and m = 10, this bound is 10!/1000^9 ≈ 3.6 × 10^{−21}. Having a disconnected component in the subnetwork of shortcuts increases the value of L. In particular, consider the configuration of shortcuts shown in Fig. 2(a), where one of the shortcuts in Fig. 1(a) is disconnected from the rest of the subnetwork of shortcuts. If the shortest path between a pair of nodes involves going from the disconnected shortcut to the rest of the subnetwork, then its length is increased by 2 compared to the path length between the corresponding pair in Fig. 1(a). This increases the average path length L.
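How rarely random shortcuts form a connected subnetwork can be probed numerically. The sketch below assumes shortcuts are drawn as uniformly random pairs of distinct nodes (overlaps with lattice links are ignored), counts the connected components of the shortcut subnetwork with a union-find structure, and estimates the probability of connectivity by Monte Carlo sampling. The function names are hypothetical, and for tiny cases the estimate can be checked against an exact count.

```python
import random


def count_components(shortcuts):
    """Number of connected components of the shortcut subnetwork (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in shortcuts:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(x) for x in list(parent)})


def prob_connected(n, m, trials=10_000, seed=0):
    """Monte Carlo estimate that m random shortcuts form one connected subnetwork."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shortcuts = [tuple(rng.sample(range(n), 2)) for _ in range(m)]
        if count_components(shortcuts) == 1:
            hits += 1
    return hits / trials


print(prob_connected(100, 2), prob_connected(100, 4), prob_connected(1000, 10, trials=2_000))
```

Even for m = 10 on 1000 nodes, no connected sample is expected in a few thousand trials.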

Next, observe that the nodes in the subnetwork of shortcuts must be uniformly distributed over the network. This can be seen by noting that the average length of the shortest path from a node to its nearest shortcut node is smallest when these nodes are uniformly distributed.
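The uniform-spacing argument can be illustrated on a ring of N nodes with k = 1. A minimal sketch (the function names are hypothetical) computes the average lattice distance from each node to its nearest shortcut node. On a 12-node ring, three equally spaced shortcut nodes give an average of 1, three adjacent ones give 25/12, and random placements are never better than equal spacing.

```python
import random


def ring_distance(i, j, n):
    """Number of lattice steps between nodes i and j on an n-node ring (k = 1)."""
    d = abs(i - j) % n
    return min(d, n - d)


def mean_distance_to_nearest(n, marked):
    """Average distance from each of the n nodes to its closest marked node."""
    return sum(min(ring_distance(i, j, n) for j in marked) for i in range(n)) / n


n = 12
uniform = mean_distance_to_nearest(n, [0, 4, 8])    # equally spaced
clustered = mean_distance_to_nearest(n, [0, 1, 2])  # bunched together
rng = random.Random(0)
random_avg = sum(mean_distance_to_nearest(n, rng.sample(range(n), 3))
                 for _ in range(1000)) / 1000
print(uniform, clustered, random_avg)
```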

Finally, among all possible configurations of connected subnetworks of shortcuts with uniformly distributed nodes, the ones with a single center involve the largest number of nodes (namely, m + 1). Figure 2(b) shows some examples of connected subnetworks with m = 6. Increasing the number of nodes involved in the shortcut subnetwork reduces L, since it reduces the average path length to the nearest shortcut node. Among all connected configurations of shortcuts having m + 1 nodes, the ones having a single center give the smallest value of L, since the average path length within the shortcut subnetwork is smallest in that case.
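The last step compares connected shortcut subnetworks. The sketch below uses a hypothetical helper that assumes the subnetwork is connected. It returns the number of nodes of a shortcut subnetwork and the average hop distance between them. With m = 6, a single center and a chain both involve m + 1 = 7 nodes, but the average distance inside the subnetwork is 12/7 for the single center and 8/3 for the chain. A closed loop of 6 shortcuts involves only 6 nodes.

```python
from collections import deque
from itertools import combinations


def subnetwork_stats(shortcuts):
    """Return (number of nodes, mean hop distance between node pairs) of a
    connected shortcut subnetwork; raises KeyError if it is disconnected."""
    adj = {}
    for u, v in shortcuts:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def hops_from(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for w in adj[x]:
                if w not in dist:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        return dist

    dist = {s: hops_from(s) for s in adj}
    pairs = list(combinations(adj, 2))
    mean = sum(dist[a][b] for a, b in pairs) / len(pairs)
    return len(adj), mean


star = [(0, i) for i in range(1, 7)]           # single center, m = 6
chain = [(i, i + 1) for i in range(6)]         # connected, no center
loop = [(i, (i + 1) % 6) for i in range(6)]    # closed loop of shortcuts
print(subnetwork_stats(star), subnetwork_stats(chain), subnetwork_stats(loop))
```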

These arguments indicate that given a fixed num- 
ber of shortcuts, the networks with a connected sub- 
network of shortcuts having nodes uniformly distributed 
have smaller L than a typical random configuration, and 
among those the ones with a single center minimize L. 
In other words, the "smallest" small-world networks are 
characterized by these structures. 

Now we compute explicitly the average path length for a configuration with a single center in the case of small-world networks constructed from a one-dimensional lattice. Consider N nodes arranged uniformly on a circle of unit circumference, where each node is connected to its two nearest-neighbor nodes. In addition, consider shortcuts connecting m arbitrary pairs of nodes. To simplify the calculation, we take the continuum limit N → ∞ with m fixed, in which the network becomes a continuous graph composed of a circle corresponding to the lattice and chords representing the shortcuts. Let us define the distance d(P, Q) between points P and Q on the continuous graph as the length of the shortest continuous path along the graph, regarding the length of a chord as zero. In other words, a shortcut is regarded as identifying two points on the circle, rather than merely connecting them. Then the number of links in the shortest path between nodes P and Q in the original network, normalized by N, tends to d(P, Q) as N → ∞. This one-dimensional model, despite being one of the simplest models of small-world networks, captures basic features of many real networks. In Ref. [?], a mean-field-type argument was used to derive an analytical expression for the average of L over random configurations of shortcuts, which was later improved in Ref. [10]. In the following, we derive an analytical expression for the configuration with a single center.

FIG. 3: The continuum-limit model with a configuration having a single center. (a) Q is in A_P, the arc containing P; (b) Q is not in A_P.

Consider the configuration of shortcuts with a center node connected to m other points on the circle, as shown in Fig. 3. The m + 1 points, including the center point, are equally spaced with ξ = 1/(m + 1), and they divide the circle into m + 1 arcs of the same length. We will compute the average of d(P, Q) taken over all pairs (P, Q). Without loss of generality, we may consider P as fixed. Let A_P be the arc in which P lies. Suppose first that Q ∈ A_P, as in Fig. 3(a). Because the end points of A_P are connected to each other through the center by shortcuts of zero length, distances within A_P are equivalent to distances on a circle of circumference ξ. Therefore, the average of d(P, Q) over all pairs (P, Q) such that Q ∈ A_P is equal to the average distance between two points on a circle of circumference ξ, which is ξ/4. Suppose now that Q ∉ A_P, as in Fig. 3(b). Let us denote the distance from P to its closest shortcut connection by α, and the distance from Q to its closest shortcut connection by β. Since the shortest path between P and Q must pass through the shortcuts, which have length zero, we have d(P, Q) = α + β. Averaging this over all possible choices of α and β, which independently take any value between 0 and ξ/2 with uniform probability, we obtain ξ/4 + ξ/4 = ξ/2. Noting that the probabilities that Q ∈ A_P and that Q ∉ A_P are 1/(m + 1) and m/(m + 1), respectively, the normalized average path length ℓ can be calculated as



$$\ell = \overline{d(P,Q)} = \frac{1}{m+1}\,\frac{\xi}{4} + \frac{m}{m+1}\,\frac{\xi}{2} = \frac{2m+1}{4(m+1)^2}.$$
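The derivation, which gives ℓ = (2m + 1)/(4(m + 1)^2) for nearest-neighbor links, can be checked by Monte Carlo sampling of the continuum model. The sketch below assumes that the center and its m partners sit at positions i/(m + 1) on the unit circle. It treats all of these points as a single identified point (shortcuts of length zero) and averages d(P, Q) over random pairs. The function names are hypothetical.

```python
import random


def continuum_distance(x, y, m):
    """Distance d(P, Q) on the continuum single-center model.

    Positions x, y lie in [0, 1) on a circle of unit circumference. The center
    and its m partners sit at i / (m + 1); because shortcuts have zero length,
    all of these points act as one identified point.
    """
    xi = 1.0 / (m + 1)               # arc length between shortcut points
    a = min(int(x * (m + 1)), m)     # index of the arc containing P
    b = min(int(y * (m + 1)), m)     # index of the arc containing Q
    if a == b:
        # Same arc: its two end points coincide, so it behaves like a circle
        # of circumference xi.
        gap = abs(x - y)
        return min(gap, xi - gap)
    alpha = min(x - a * xi, (a + 1) * xi - x)  # P to its nearest shortcut point
    beta = min(y - b * xi, (b + 1) * xi - y)   # Q to its nearest shortcut point
    return alpha + beta


def formula(m, k=1):
    """Closed form (2m + 1) / (4k(m + 1)^2) derived in the text."""
    return (2 * m + 1) / (4 * k * (m + 1) ** 2)


def monte_carlo(m, samples=100_000, seed=0):
    """Average of d(P, Q) over uniformly random pairs (P, Q)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += continuum_distance(rng.random(), rng.random(), m)
    return total / samples


for m in (1, 5, 10):
    print(m, monte_carlo(m), formula(m))
```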



Let us now consider a more general situation in which each node in the network is connected to its neighboring nodes up to its kth nearest neighbors. Because of the connections to kth nearest neighbors, following the shortest path between nodes P and Q takes 1/k as many steps as in the case discussed above. Hence, we must also scale ℓ, the normalized average path length of the network, by a factor of 1/k, yielding



$$\ell = \frac{1}{k}\,\overline{d(P,Q)} = \frac{2m+1}{4k(m+1)^2}. \qquad (1)$$
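For a finite ring, Eq. (1), ℓ = (2m + 1)/(4k(m + 1)^2), can be compared with an exact breadth-first-search computation. This sketch assumes N is a multiple of m + 1, so that the center (node 0) and its m partners are exactly equally spaced. The helper names are hypothetical. Each discrete shortcut costs one link rather than zero. The computed L should therefore exceed Nℓ, but for k = 1 by no more than about two links, consistent with an error of order 1/N in the normalized length.

```python
from collections import deque


def ring_with_shortcuts(n, shortcuts, k=1):
    """Ring lattice (k neighbors per side) plus the given shortcut links."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for u, v in shortcuts:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs u != v (connected graph)."""
    n = len(adj)
    total = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for w in adj[x]:
                if dist[w] < 0:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        total += sum(dist)
    return total / (n * (n - 1))


def single_center_shortcuts(n, m):
    """Hypothetical layout: node 0 is the center, linked to m equally spaced nodes."""
    if n % (m + 1):
        raise ValueError("n must be a multiple of m + 1 for exact spacing")
    step = n // (m + 1)
    return [(0, i * step) for i in range(1, m + 1)]


def eq1(m, k=1):
    """Continuum-limit result (2m + 1) / (4k(m + 1)^2)."""
    return (2 * m + 1) / (4 * k * (m + 1) ** 2)


n, m = 330, 10
L = average_path_length(ring_with_shortcuts(n, single_center_shortcuts(n, m)))
print(L, n * eq1(m))  # discrete L vs. N times the continuum value
```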



An important observation about Eq. (1) is that it can be written as ℓ = f(m)/k, where f(m) is a function that depends only on the number of shortcuts. The formula derived in Ref. [?] for the average ℓ_r of the normalized path length over random configurations of shortcuts also has this form, with a different function f, namely,

$$\ell_r = \frac{1}{k}\,\frac{1}{2\sqrt{m^2+2m}}\,\tanh^{-1}\sqrt{\frac{m}{m+2}}. \qquad (2)$$
FIG. 4: Normalized path length of the network as a function of the number m of shortcuts for k = 1. The continuous curve is Eq. (1). The circles and squares are the numerical computation of ℓ for the configuration with a single center and of ℓ_r over 10 random shortcut configurations, respectively. The inset shows the ratio ℓ_r/ℓ computed from numerical simulations (circles) and from the theoretical results (1) and (2) for N → ∞ (continuous line). N = 10^4 was used for numerical computations.



Note also that since the shortcuts are considered to have 
length zero, the derivation above remains correct as long 
as the subnetwork of shortcuts is connected and has uni- 
formly distributed nodes, suggesting that in the contin- 
uum limit these two conditions are sufficient to achieve 
the minimum of L. 

Figure 4 compares the calculation summarized in Eq. (1) (continuous curve) with numerical computation of ℓ for a single center (circles) and of ℓ_r over 10 random configurations of shortcuts (squares). This shows excellent agreement of Eq. (1) with the simulation. In fact, the error in Eq. (1) due to the approximation N → ∞ is of order 1/N, mainly because the normalized length of a shortcut is taken to be zero rather than 1/N. The inset in Fig. 4 shows the ratio ℓ_r/ℓ as a function of the number m of shortcuts. Here the ratio is computed from numerical simulations (circles) and from the theoretical results (1) and (2) (continuous curve). Since Eq. (1) is valid for m ≪ N and Eq. (2) is valid for 1 ≪ m ≪ N, the curve in the inset is exact in the limit N → ∞ with m ≫ 1 fixed. Using the asymptotic form ℓ_r ~ log(2m)/(4m) of Eq. (2) for m ≫ 1, one sees that ℓ_r/ℓ ~ log m, which explains why the curve in the inset is almost a straight line for large m. Numerical results in the inset indicate that finite size and large shortcut density actually increase the ratio, making the benefit of optimizing the shortcut configuration to a single-center model even larger than the theoretical prediction.
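The ratio ℓ_r/ℓ can be evaluated directly from two closed forms. The sketch assumes that the random-shortcut result has the mean-field form ℓ_r = (1/k) · atanh(√(m/(m + 2))) / (2√(m² + 2m)). This expression reduces to 1/(4k) as m → 0, the same value as the single-center formula (2m + 1)/(4k(m + 1)^2), and behaves as log(2m)/(4km) for large m, matching the asymptotic form quoted above. The function names are hypothetical. Because both formulas share the factor 1/k, the ratio depends only on m.

```python
import math


def ell_single(m, k=1):
    """Single-center result (2m + 1) / (4k(m + 1)^2)."""
    return (2 * m + 1) / (4 * k * (m + 1) ** 2)


def ell_random(m, k=1):
    """Assumed mean-field average over random shortcut configurations."""
    if m == 0:
        return 1 / (4 * k)  # limiting value as m -> 0
    f = math.atanh(math.sqrt(m / (m + 2))) / (2 * math.sqrt(m * m + 2 * m))
    return f / k


def ratio(m):
    """l_r / l; the common factor 1/k cancels."""
    return ell_random(m) / ell_single(m)


for m in (1, 10, 100, 1000):
    # The last column is the large-m approximation log(2m) / 2.
    print(m, ratio(m), math.log(2 * m) / 2)
```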

FIG. 5: Ten best solutions obtained by the genetic-algorithm simulations. The corresponding average path lengths are (a) L = 44.962, (b) L = 44.995, (c) L = 45.043, (d) L = 45.044, (e) L = 45.163, (f) L = 45.221, (g) L = 45.227, (h) L = 45.275, (i) L = 45.283, (j) L = 45.286. N = 1000, m = 10, and k = 1 are used.

Finally, we simulate optimization of the shortcut configuration for a one-dimensional array of nodes using the genetic-algorithm (GA) methodology [11]. An initial population is a collection of various shortcut configurations, each specified by m pairs of integers representing the locations of the nodes connected by shortcuts. The fitness of each configuration is defined to be L^{−1}, where L is the average path length. A new population of shortcut configurations is created from the old one in analogy with reproduction in population genetics: a configuration is viewed as the genome of an individual in the population, and in creating a new population, we allow one-point crossovers (i.e., interchanging subsets of shortcuts) and mutations (i.e., changes in the locations of end points by Gaussian random numbers). This creation process is continued until the fitness of the best individual in the population is constant over 100 generations, which gives a candidate for the optimal solution. The program for the simulation was developed using a C++ library called GAlib [12].
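The GA loop just described can be sketched in a few dozen lines of Python. This is not the authors' GAlib program. The selection rule (the better half of each generation become parents), elitism, the mutation rate, and the Gaussian width `sigma` are assumptions, since the text does not specify them, and all names are hypothetical. Configurations are lists of m node pairs, fitness is 1/L, crossover is one-point (which requires m ≥ 2), and the loop stops once the best fitness has not improved for `stall` generations.

```python
import random
from collections import deque


def average_path_length(n, shortcuts, k=1):
    """Mean shortest-path length of a ring (k neighbors per side) plus shortcuts."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for u, v in shortcuts:
        if u != v:  # a mutation may collapse a shortcut onto a single node
            adj[u].add(v)
            adj[v].add(u)
    total = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for w in adj[x]:
                if dist[w] < 0:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        total += sum(dist)
    return total / (n * (n - 1))


def crossover(a, b, rng):
    """One-point crossover: first shortcuts from parent a, the rest from parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(config, n, rng, rate=0.2, sigma=3.0):
    """Move each end point, with probability `rate`, by a rounded Gaussian step."""
    out = []
    for u, v in config:
        if rng.random() < rate:
            u = round(u + rng.gauss(0.0, sigma)) % n
        if rng.random() < rate:
            v = round(v + rng.gauss(0.0, sigma)) % n
        out.append((u, v))
    return out


def genetic_search(n, m, k=1, pop_size=20, stall=100, max_gen=1000, seed=0):
    """Return (best configuration, its L, best L in the initial population)."""
    rng = random.Random(seed)

    def evaluate(population):
        # Fitness is 1/L; sort so that the fittest configuration comes first.
        scored = [(1.0 / average_path_length(n, c, k), c) for c in population]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored

    population = [[tuple(rng.sample(range(n), 2)) for _ in range(m)]
                  for _ in range(pop_size)]
    scored = evaluate(population)
    best_fit, best = scored[0]
    initial_L = 1.0 / best_fit
    unchanged = 0
    for _ in range(max_gen):
        parents = [c for _, c in scored[:max(2, pop_size // 2)]]
        children = [best]  # elitism: the best configuration so far survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n, rng))
        scored = evaluate(children)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= stall:  # best fitness constant for `stall` generations
                break
    return best, 1.0 / best_fit, initial_L


best, L_best, L_init = genetic_search(30, 3, pop_size=10, stall=10, max_gen=30, seed=1)
print(best, L_best, L_init)
```

Because the best configuration is always carried into the next generation, the returned L can never be worse than the best L of the initial population.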

The ten best solutions (here best means having the shortest average path length) resulting from 254 independent runs with m = 10, k = 1, N = 1000, and a population size of 100 are shown in Fig. 5. First, observe that in each case the subnetwork of shortcuts is connected. This was the case in every solution found using the genetic algorithm. Second, in each case there are centers from which many shortcuts emanate. Moreover, the nodes in the subnetwork are approximately equally spaced around the circle. These observations are consistent with the argument used above to establish our results. All solutions in Fig. 5 have average path lengths within 2% of that achieved by the single-center configuration (44.577). In contrast, the corresponding value for random shortcuts (≈ 88) is almost double the single-center value. Although the single-center solution was not found by the genetic algorithm, owing to the limited number (254) of simulation runs, the results show that configurations with several centers are almost as efficient as the single-center configuration, as long as the subnetwork of shortcuts is connected and its nodes are uniformly distributed. The single-center solution was found for smaller networks with N = 100 and m = 5, for which the computation is less demanding.







FIG. 6: Ten best solutions from 81 independent runs of GA 
simulation with the population size of 30, N = 1000, m = 10,
and k = 2. The corresponding average path lengths are (a) 
L = 24.309, (b) L = 24.379, (c) L = 24.622, (d) L = 24.627, 
(e) L = 24.640, (f) L = 24.650, (g) L = 24.653, (h) L = 
24.660, (i) L = 24.779, (j) L = 24.798. The average path 
length is 23.795 for the single-center configuration, while it is 
approximately 43 for random shortcuts. 



Other values of k should lead to similar results; the case of k = 2 is shown in Fig. 6. In fact, due to
the generality of the argument given earlier, we expect 
that the results can be extended to the case where the 
shortcuts are added to a lattice of higher dimension, or 
to a regular network of another type. 

The results of these GA simulations show that the design elements for efficient networks are (1) connectedness of the shortcut subnetwork, (2) uniform distribution of nodes in the subnetwork, and (3) existence of centers.

We expect to see many examples of real networks with such structures. Our computations on the neural network of C. elegans (which has 285 nodes, 2347 links, and 112 shortcuts) show that the structures are indeed present: (i) the shortcut subnetwork has far fewer (15) connected components than the average (~ 47) for randomly chosen shortcuts, and the size of its giant component (75) is significantly larger than the average (≈ 12) over random shortcuts; (ii) most (≈ 88%) of the nodes are within one step of a shortcut; (iii) there are a few nodes having many shortcuts (11 shortcuts at the main center). In general, a network with such structures is robust against random failures, although it is sensitive to deliberate attacks on the centers. This property, which is shared by scale-free networks [13], has been shown to characterize many real networks such as the Internet and the WWW [14]. However, some biological networks may be robust even against attacks on the centers, since the loss of a center can result in shortcuts reconnecting to nearby nodes, followed by an optimization process that quickly recovers the smallest configuration.
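The structural statistics quoted for C. elegans can be computed for any network once shortcuts are identified by the "more than two links apart" definition. The C. elegans data are not reproduced here. The sketch below is a hypothetical helper applied to a toy ring. "Within one step of a shortcut" is interpreted, as an assumption, as being a shortcut end node or a neighbor of one.

```python
import math
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def distance_avoiding_link(adj, u, v):
    """BFS distance from u to v with the direct link u-v ignored."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if {x, w} == {u, v}:
                continue
            if w not in dist:
                dist[w] = dist[x] + 1
                if w == v:
                    return dist[w]
                queue.append(w)
    return math.inf


def shortcut_diagnostics(adj):
    """Summarize the shortcut structure of an undirected graph (list of neighbor sets)."""
    n = len(adj)
    # 1. Identify shortcuts with the "more than two links apart" criterion.
    shortcuts = [(u, v) for u in range(n) for v in adj[u]
                 if u < v and distance_avoiding_link(adj, u, v) > 2]
    sub = {}
    for u, v in shortcuts:
        sub.setdefault(u, set()).add(v)
        sub.setdefault(v, set()).add(u)
    # 2. Connected components of the shortcut subnetwork (depth-first search).
    seen, sizes = set(), []
    for start in sub:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            x = stack.pop()
            size += 1
            for w in sub[x]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    # 3. Nodes that are shortcut end nodes or adjacent to one.
    near = set(sub)
    for x in sub:
        near |= adj[x]
    return {
        "shortcuts": len(shortcuts),
        "components": len(sizes),
        "giant_component": max(sizes, default=0),
        "fraction_within_one_step": len(near) / n,
        "max_shortcuts_per_node": max((len(nb) for nb in sub.values()), default=0),
    }


graph = ring_lattice(30, 2)
for u, v in [(0, 10), (0, 20)]:  # a toy single-center pattern
    graph[u].add(v)
    graph[v].add(u)
print(shortcut_diagnostics(graph))
```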

We have shown that among small-world networks having a fixed number of shortcuts, the average path length is smallest when there exists a single center through which all of the shortcuts are connected and the shortcut nodes are uniformly distributed in the network. We have also shown that the average path length is almost as small when the shortcuts are connected and have a few centers, a result supported by the GA simulations. Our results have important consequences in situations where efficient information flow over a large network is required. The fact that the architecture of connected shortcuts with centers arises through genetic algorithms suggests that such a structure could emerge in networks in natural organisms (e.g., the neural network of C. elegans), although the fitness used in the GA here is not necessarily related to fitness under natural selection in biology. In particular, it provides a potential mechanism for the appearance of highly connected nodes while keeping high clustering in networks that are evolving but not necessarily growing, such as neural and metabolic networks.